Reusing an FM-index

Authors

  • Travis Gagie
  • Giovanni Manzini
  • Jouni Sirén

Abstract

Intuitively, if two strings S1 and S2 are sufficiently similar and we already have an FM-index for S1 then, by storing a little extra information, we should be able to reuse parts of that index in an FM-index for S2. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.
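The abstract does not spell out the construction. What follows is a minimal sketch of one possible way to make the intuition concrete; it is not necessarily the paper's method.

The sketch assumes the following:
- An FM-index answers pattern counts by backward search over the BWT, using rank queries.
- We mark one common subsequence shared by BWT(S1) and BWT(S2) with bit vectors.
- We keep only the unshared symbols of each BWT as leftover strings.
- Rank on BWT(S2) can then be assembled from rank on BWT(S1) plus this extra data.

All helper names (`bwt`, `count_occurrences`, `common_subsequence_marks`, `ReusedRank`) are hypothetical. The naive `str.count` rank stands in for real succinct rank/select structures.

```python
from collections import Counter


def bwt(s: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; illustration only)."""
    s += "$"  # assumes '$' is absent from s and sorts before every other symbol
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(L: str, pattern: str, rank) -> int:
    """FM-index backward search; rank(c, i) = occurrences of c in L[:i]."""
    counts = Counter(L)
    C, total = {}, 0
    for c in sorted(counts):  # C[c] = number of symbols in L smaller than c
        C[c] = total
        total += counts[c]
    lo, hi = 0, len(L)  # half-open range of sorted rotations starting with the suffix read so far
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + rank(c, lo), C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


def common_subsequence_marks(L1: str, L2: str):
    """Bit vectors marking one longest common subsequence of L1 and L2 (O(|L1||L2|) DP)."""
    n, m = len(L1), len(L2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if L1[i] == L2[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    B1, B2 = [0] * n, [0] * m
    i = j = 0
    while i < n and j < m:
        if L1[i] == L2[j]:  # matching equal symbols is always safe for an LCS
            B1[i] = B2[j] = 1
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return B1, B2


class ReusedRank:
    """Rank on L2 = BWT(S2), computed from rank on L1 = BWT(S1) plus a little extra data."""

    def __init__(self, L1: str, L2: str, rank_L1):
        self.rank_L1 = rank_L1  # stands in for the rank structure of S1's existing FM-index
        B1, B2 = common_subsequence_marks(L1, L2)
        self.marked_L1 = [p for p, b in enumerate(B1) if b]  # select support on B1
        self.marked_prefix_L2 = [0]  # rank support on B2
        for b in B2:
            self.marked_prefix_L2.append(self.marked_prefix_L2[-1] + b)
        self.X1 = "".join(c for c, b in zip(L1, B1) if not b)  # unshared symbols of L1
        self.X2 = "".join(c for c, b in zip(L2, B2) if not b)  # unshared symbols of L2

    def rank(self, c: str, i: int) -> int:
        k = self.marked_prefix_L2[i]  # shared symbols in L2[:i]
        p = self.marked_L1[k - 1] + 1 if k else 0  # shortest prefix of L1 holding k shared symbols
        shared = self.rank_L1(c, p) - self.X1[: p - k].count(c)  # shared c's in L1[:p]
        return shared + self.X2[: i - k].count(c)  # plus unshared c's in L2[:i]


S1 = "GATTACAGATTACA"
S2 = "GATTACAGATCACA"  # differs from S1 in a single position
L1, L2 = bwt(S1), bwt(S2)
reuse = ReusedRank(L1, L2, lambda c, i: L1[:i].count(c))
print(count_occurrences(L2, "ACA", reuse.rank))  # → 2
print(f"{len(reuse.X2)} of {len(L2)} symbols of L2 are stored outside the shared part")
```

In this scheme, only the two bit vectors and the two leftover strings are stored beyond S1's index. Following the abstract's intuition, the hope is that similar strings have BWTs sharing a long common subsequence, which would keep the leftovers short. A practical version would replace the lists and `str.count` calls with compressed bit vectors and rank/select structures.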

Similar resources

FM-index for Dummies

The FM-index is a celebrated compressed data structure for full-text pattern searching. After the first wave of interest in its theoretical developments, we can observe a surge of interest in practical FM-index variants in the last few years. These enhancements are often related to a bit-vector representation, augmented with an efficient rank-handling data structure. In this work, we propose a n...

FM-KZ: An even simpler alphabet-independent FM-index

In an earlier work [6] we presented a simple FM-index variant, based on the idea of Huffman-compressing the text and then applying the Burrows-Wheeler transform over it. The main drawback of using Huffman was its lack of synchronizing properties, forcing us to supply another bit stream indicating the Huffman codeword boundaries. In this way, the resulting index needed O(n(H_0 + 1)) bits of space b...

Based Specifications – reusing specifications, programs and proofs

The system has been designed for developing large interactive proofs. In particular, the GUI provides commands for reading and writing hierarchical proofs by letting the user focus on part of a proof. TLAPS uses a fingerprinting mechanism to store proof obligations and their status. It thus avoids reproving previously proved obligations, even after a model or a proof has been restructured, and ...

Re-engineering Based Feature Model Management for Software Product Line

Software Product Line Engineering (SPLE) is an emerging software engineering paradigm based on reusing software artifacts gained from previous software development lifecycles. Research on domain analysis, feature modeling (FM), and commonality and variability analysis has been carried out for SPLE. This system therefore proposes re-engineering ...

A bloated FM-index reducing the number of cache misses during the search

The FM-index is a well-known compressed full-text index, based on the Burrows–Wheeler transform (BWT). During a pattern search, the BWT sequence is accessed at “random” locations, which is cache-unfriendly. In this paper, we are interested in speeding up the FM-index by working on q-grams rather than individual characters, at the cost of using more space. The first presented variant is related t...

Journal:
  • CoRR

Volume: abs/1404.4814

Publication date: 2014